This tutorial is heavily based on the Appendices and Supporting Information found in the Open Access {aniMotum} paper:
Please refer to the paper, appendices and the R package documentation for a comprehensive overview of {aniMotum}.
Please download the {aniMotum} package from R-Universe.
# install from my R-universe repository
install.packages("aniMotum",
repos = c("https://cloud.r-project.org",
"https://ianjonsen.r-universe.dev"),
dependencies = TRUE)
This installs all Imported and Suggested R packages from CRAN and R-universe. If queried, answer Yes to install the source version. Note, if you haven’t installed {aniMotum} previously then installation of dependent packages may take a while, especially if many of them need to be compiled. You should only need to go through this once, subsequent installation of {aniMotum} updates will be much faster.
Alternatively you can download a binary version of aniMotum here: https://ianjonsen.r-universe.dev/aniMotum
For full instructions see the aniMotum homepage on GitHub: https://github.com/ianjonsen/aniMotum
The aim of this practical is to give you an understanding of how we can use the {aniMotum} R package to process and analyse animal movement data. My aim is that you can go on to apply similar workflows to your own animal movement data.
During this practical we will:
This practical will be based on the {tidyverse} style of
R coding. You will become familiar with using pipes
|> to perform data manipulations, using
ggplot to generate publication quality figures, and using
the {sf} package to handle spatial data.
For more information on the {tidyverse} check out the Wickham &
Grolemund book ‘R for Data Science’. You can access it for free online
here:
https://r4ds.had.co.nz
The project website can be accessed here:
https://www.tidyverse.org
For more information on the {sf} project check out https://r-spatial.github.io/sf/
As discussed in the presentation we can pass data from several different tag types to {aniMotum}. For an outline of how {aniMotum} expects this data to be formatted see: https://ianjonsen.github.io/aniMotum/articles/Overview.html
In March 2019 I was part of an expedition to deploy Satellite Relay Data Loggers on harp seal pups in the Gulf of St Lawrence, Canada. These devices are satellite linked computers capable of recording a wide range of information and transmitting it via the ARGOS network. Here we will focus on the location data recorded by the ARGOS satellites when communicating with three of the tags.
Harp seals give birth on the sea ice when it is at it’s most southerly extent in late Winter/ early Spring. Females wean their pup after around 2 weeks. We deployed tags on pups that were approximately 3 weeks old, and tracked their movements as they left the sea ice and began their first migration north.
This work was published in Royal Society Open Science and is available Open Access here:
Data for 3 animals are available in the repo. Lets load them into R and have a quick look.
# load libraries
library(aniMotum)
library(patchwork)
library(sf)
library(tidyverse)
# load harp seal locations
locs <- read_csv("harps.csv")
locs
## # A tibble: 10,884 × 8
## id date lc lon lat smaj smin eor
## <chr> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 hp6-747-19 2019-03-23 00:08:33 3 -59.6 48.1 269 93 97
## 2 hp6-747-19 2019-03-23 00:25:33 3 -59.6 48.1 338 75 69
## 3 hp6-747-19 2019-03-23 00:37:38 1 -59.6 48.1 4984 144 83
## 4 hp6-747-19 2019-03-23 10:20:14 3 -59.7 48.2 358 89 116
## 5 hp6-747-19 2019-03-23 10:22:18 3 -59.7 48.2 239 82 96
## 6 hp6-747-19 2019-03-23 15:07:47 2 -59.7 48.2 1345 76 132
## 7 hp6-747-19 2019-03-23 15:36:07 3 -59.7 48.2 665 48 90
## 8 hp6-747-19 2019-03-23 20:03:42 B -59.7 48.2 4369 528 42
## 9 hp6-747-19 2019-03-23 20:13:27 2 -59.7 48.3 1890 72 84
## 10 hp6-747-19 2019-03-24 01:25:02 3 -59.7 48.3 271 70 82
## # ℹ 10,874 more rows
The harp data are formatted in a pretty standard way, including
Argos’ Least-Squares-based location error classes lc. This
is the minimum required by fit_ssm:
- id a unique identifier for each animal (or track) in the
data.frame or tibble.
- date a date-time variable in the form YYYY-mm-dd HH:MM:SS
(or YYYY/mm/dd HH:MM:SS). These can be text strings, in which case
aniMotum converts them to POSIX format and assumes the timezone is
UTC.
- lc the location class variable common to Argos data, with
classes in the set: 3, 2, 1, 0, A, B, Z.
- lon the longitude variable.
- lat the latitude variable.
The lc values determine the measurement error variances
(based on independent data, see Jonsen et
al. 2020) used in the SSM’s for each observation.
Since 2011, the default Argos location data uses CLS Argos’ Kalman Filter (KF) algorithm. These data include error ellipse information for each observed location in the form of 3 variables: ellipse semi-major axis length, ellipse semi-minor axis length, and ellipse orientation.
The column names follow those for Argos LS data, with the following
additions:
- smaj the Argos error ellipse semi-major axis length
(m).
- smin the Argos error ellipse semi-minor axis length
(m).
- eor the Argos error ellipse ellipse orientation (degrees
from N).
Here, the error ellipse parameters for each observation define the measurement error variances used in the SSM’s (Jonsen et al. 2020). Missing error ellipse values are allowed, in which case, those observations are treated as Argos LS data.
# preliminary plot
p1 <- ggplot() +
theme_bw() +
geom_point(aes(x = lon, y = lat), data = locs) +
facet_wrap(~id)
print(p1)
The plot reveals one track has large gaps while the others have (at least one) obviously erroneous location estimates.
Let’s check how frequently the tags were transmitting. This can guide the frequency of regularised locations estimated by {aniMotum}.
locs |>
group_by(id) |>
summarise(mean_interval_hrs = as.numeric(mean(difftime(date, lag(date)), na.rm = T), "hours"),
max_interval_hrs = as.numeric(max(difftime(date, lag(date)), na.rm = T), "hours"))
## # A tibble: 3 × 3
## id mean_interval_hrs max_interval_hrs
## <chr> <dbl> <dbl>
## 1 hp6-747-19 2.97 303.
## 2 hp6-749-19 0.936 15.0
## 3 hp6-756-19 1.17 54.3
On average two of the tags were transmitting hourly, with the first tag only transmitting every 3 hours. This difference is probably driven by the large gaps in transmission: 303 hours is 12.5 days!
We can use {aniMotum} to regularise and error correct these movement paths using a state-space model. For speed/ simplicity we’ll ask for daily locations, but you can easily adjust this depending on your questions. For example, in the paper I dropped the data with large gaps and then used a 6 hour interval to match the locations to the 6 hour dive summary data transmitted by the tags.
We can fit to all animals in the same call using the
fit_ssm function. Here I’m fitting a correlated random walk
with model = "crw" other options are random walk
(model = "rw") and move persistence
(model = "mp").
fit <- fit_ssm(locs,
vmax = 5,
model = "crw",
time.step = 24,
control = ssm_control(verbose = 0))
You can access the results and summary information about the model
fit with the summary function:
summary(fit)
## Animal id Model Time n.obs n.filt n.fit n.pred n.rr converged AICc
## hp6-747-19 crw 24 1971 583 1388 245 . TRUE 22638.1
## hp6-749-19 crw 24 5073 1339 3734 199 . TRUE 49318.5
## hp6-756-19 crw 24 3840 1027 2813 189 . TRUE 42544.1
##
## --------------
## hp6-747-19
## --------------
## Parameter Estimate Std.Err
## D_x 1.8448 0.176
## D_y 4.5803 0.5023
## rho_p -0.1793 0.0848
## psi 8.9861 0.3696
##
## --------------
## hp6-749-19
## --------------
## Parameter Estimate Std.Err
## D_x 3.3592 0.1955
## D_y 5.3051 0.3263
## rho_p 0.0706 0.0491
## psi 7.6369 0.1835
##
## --------------
## hp6-756-19
## --------------
## Parameter Estimate Std.Err
## D_x 5.5716 0.3673
## D_y 23.8166 1.5672
## rho_p -0.1603 0.07
## psi 14.8114 0.438
We can quickly check the model fit via a 1-D time series plot of the model:
plot(fit,
what = "predicted",
type = 1,
pages = 1)
The fitted points (orange) are overlayed on top of the observations (blue). Note the uncertainty increases when interpolating across the gaps we highlighted earlier. You would need to carefully consider whether to include these interpolated sections in any further analysis.
A 2-D plot of the model:
plot(fit,
what = "predicted",
type = 2,
pages = 1,
ncol = 3)
Again the 2D plot highlights the uncertainty in hp6-747-19’s path.
We can map the predicted locations straight from the
fit_ssm object as follows:
aniMotum::map(fit,
what = "predicted")
## using map scale: 10
Often when validating models we want to assess residual plots.
{aniMotum} offers the option of calculating one-step-ahead prediction
residuals. These can be helpful but are computationally demanding. We
calculate the residuals via the osar function and can
visualise them as a time-series, QQ plot and autocorrelation plot.
res <- osar(fit)
plot(res, type = "ts", pages = 1) | plot(res, type = "qq", pages = 1) | plot(res, type = "acf", pages = 1)
The time series and autocorrelation plots suggest that the
crw model is a good fit to the data, although there is some
skew in the QQ plot. We could re-run the models as rw or
mp and see how this impacts model fit.
We may be interested in inferring how an individuals behaviour changes along the movement path. Move persistence is a continuous behavioural index that captures autocorrelation in both speed and direction. For more information on move persistence please read:
This is a different approach to discrete states estimated using methods such as hidden Markov models. For information on those see previous EFI/ ESA webinars by Théo Michelot and Vianey Leos Barajas.
Move persistence models can be fit either using fit_mpm
or simultaneous to the state-space model fit via the
model = "mp" argument in fit_ssm.
If you fit the model with fit_mpm then there is the
option to estimate move persistence independently for each animal
(model = "mpm"), or sharing a parameter across individuals
(model = "jmpm").
Here we fit a joint move persistence model to the three animals and then plot the results using the default plot function.
fmpm <- fit_mpm(fit,
what = "predicted",
model = "jmpm",
control = mpm_control(verbose = 0))
plot(fmpm,
pages = 1,
ncol = 2)
You can see each path varies between periods of high and low move persistence. This suggests there are changes in the movements indicative of periods of migration and periods of residency. See the Manual Mapping section for an example of mapping the migration and visualising move persistence.
It is also possible to go on to link move persistence to environmental covariates. Take a look at Jonsen et al. (2019) for the method and Grecian et al. (2022) for an application to this data set.
You may want to extract the regularised data output from
fit_ssm to use in a difference application, or you may want
to generate your own maps. You can do this with the grab
function:
# extract predicted locations and their move persistence estimates
df_locs <- fit |> grab("predicted")
df_mpm <- fmpm |> grab("fitted")
# combine
df <- df_locs |> left_join(df_mpm)
## Joining with `by = join_by(id, date)`
df
## # A tibble: 633 × 17
## id date lon lat x y x.se y.se u
## <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 hp6-7… 2019-03-23 00:00:00 -59.6 48.1 -6634. 6088. 1.00e-5 1.00e-5 -6.20e-11
## 2 hp6-7… 2019-03-24 00:00:00 -59.7 48.3 -6642. 6123. 2.28e+0 3.56e+0 8.99e- 2
## 3 hp6-7… 2019-03-25 00:00:00 -59.6 48.2 -6629. 6115. 3.13e+0 4.88e+0 1.21e+ 0
## 4 hp6-7… 2019-03-26 00:00:00 -59.5 48.2 -6621. 6108. 3.33e+0 5.17e+0 2.83e- 1
## 5 hp6-7… 2019-03-27 00:00:00 -59.5 48.2 -6620. 6110. 6.60e-1 8.10e-1 -1.02e- 1
## 6 hp6-7… 2019-03-28 00:00:00 -59.5 48.2 -6620. 6109. 3.37e+0 5.26e+0 -1.98e- 1
## 7 hp6-7… 2019-03-29 00:00:00 -59.4 48.3 -6614. 6133. 1.21e+1 6.52e+0 7.02e- 1
## 8 hp6-7… 2019-03-30 00:00:00 -59.3 48.6 -6597. 6177. 2.96e+0 3.89e+0 7.61e- 1
## 9 hp6-7… 2019-03-31 00:00:00 -59.2 48.6 -6591. 6177. 2.28e+0 3.39e+0 2.74e- 2
## 10 hp6-7… 2019-04-01 00:00:00 -59.1 48.7 -6581. 6192. 8.26e-1 1.03e+0 9.55e- 1
## # ℹ 623 more rows
## # ℹ 8 more variables: v <dbl>, u.se <dbl>, v.se <dbl>, s <dbl>, s.se <lgl>,
## # logit_g <dbl>, logit_g.se <dbl>, g <dbl>
To generate a map we need to load a suitable shapefile from the
{rnaturalearth} package. These can be loaded into your session as an
{sf} object ready to be plotted with geom_sf. I’ve then
created an {sf} object manually from the df object
containing the regularised locations and their corresponding move
persistence estimates.
Given the geographic region, I’ve defined prj as custom
Lambert Azimuthal Equal Area projection centred on the study region. We
specify the projection via the coord_sf function.
#install.packages("rnaturalearth")
require(rnaturalearth)
## Loading required package: rnaturalearth
# Generate a global shapefile and a simple plot
world <- ne_countries(scale = "medium", returnclass = "sf")
# To generate a plot with less distortion first define a projection i.e. Lambert Azimuthal Equal Area
prj = "+proj=laea +lat_0=60 +lon_0=-50 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
# Create an sf version of the locs data with a WGS84 projection and add to the plot
df_sf <- df |>
st_as_sf(coords = c('lon', 'lat')) |>
st_set_crs(4326)
ggplot() +
theme_bw() +
geom_sf(aes(), data = world) +
geom_sf(aes(colour = g), data = df_sf, show.legend = "point") +
scale_color_viridis_c(expression(gamma[t]), limits = c(0,1)) +
coord_sf(xlim = c(-2000000, 2000000), ylim = c(-2500000, 2500000), crs = prj, expand = T) +
scale_x_continuous(breaks = seq(from = -130, to = 30, by = 10)) +
facet_wrap(~id, nrow = 1)
This nicely illustrates the most likely path of the three animals from the pupping ground in the Gulf of St Lawrence as they migrate north. Colouring the points by move persistence (\(\gamma\)t) highlights periods when the animals were moving faster and more directed (lighter colours) and when the animals were travelling slower and less directed (darker).
We can also use {aniMotum} to process animal movement data collected
by GPS devices or Geolocation sensors. To do this we need to specify a
different location class via the lc argument in
fit_ssm.
The following little penguin data set is available in the
supplementary material to the aniMotum paper. Note the lc
column contains the letter G to indicate to {aniMotum} that
this is GPS data.
GPS devices will only be able to fix their location when the penguin
is at the surface, and so unsuprisingly the GPS tracks contain many gaps
when the animals were diving. For a more detailed analysis of the little
penguin data please read the original paper alongside
the analysis in the aniMotum paper. Here we regularise the location data
to 2 minute temporal resolution (fit_ssm expects the input
in hours) and calculate move persistence.
lipe <- read_csv("https://raw.githubusercontent.com/ianjonsen/foieGras.paper/main/data/lipe_gps_ex32.csv")
lipe
## # A tibble: 10,698 × 5
## id date lc lon lat
## <chr> <dttm> <chr> <dbl> <dbl>
## 1 L013m01 2017-10-12 18:00:29 G 150. -36.3
## 2 L013m01 2017-10-12 18:00:39 G 150. -36.3
## 3 L013m01 2017-10-12 18:00:49 G 150. -36.3
## 4 L013m01 2017-10-12 18:00:59 G 150. -36.3
## 5 L013m01 2017-10-12 18:01:09 G 150. -36.3
## 6 L013m01 2017-10-12 18:01:27 G 150. -36.3
## 7 L013m01 2017-10-12 18:01:37 G 150. -36.3
## 8 L013m01 2017-10-12 18:01:55 G 150. -36.3
## 9 L013m01 2017-10-12 18:02:05 G 150. -36.3
## 10 L013m01 2017-10-12 18:02:23 G 150. -36.3
## # ℹ 10,688 more rows
lipe_ssm <- fit_ssm(lipe,
vmax = 5,
min.dt = 5,
model = "crw",
time.step = 2/60,
control = ssm_control(verbose = 0))
lipe_fmpm <- fit_mpm(lipe_ssm,
what = "predicted",
model = "jmpm",
control = mpm_control(verbose = 0))
aniMotum::map(lipe_ssm, what = "predicted") | plot(lipe_fmpm, ncol = 2, pages = 1)
## using map scale: 10
When fitting models in R it’s easy to foget the heavy lifting that these models are often doing behind the scenes.
Just a few years ago, a state-space model fit to animal movement data via MCMC would take hours or even days to run. Now with TMB these models take minutes.
This also means it’s easy to throw the data into the model without doing thorough data checks first.
Most issues can be avoided by checking for unique id’s between
animals, issues with date time format or missing values before passing
any data to fit_ssm.
It’s also important to remember what location class your data have. If they are GPS or
trs <- sim_fit(fit,
what = "predicted",
reps = 5)
plot(trs,
ncol = 3)
plot(fit)
#p2 <- plot(fit, type = 1, what = “predicted”) #p3 <- plot(fit, type = 2, what = “predicted”) #p4 <- aniMotum::map(fit, what = “predicted”)
#fmpm <- fit_mpm(fit)
#xs <- osar(fit) #plot(res, type = “ts”) #plot(res, type = “qq”) #plot(res, type = “acf”)
```
aniMotum can be used to fit a state-space model to Argos
PTT, GLS or GPS data.
Have a look at the help file for fit_ssm.
For Argos PTT data you need to provide either the location class, or the semimajor and semiminor axies of the error…
For GLS data replace the lc column with a string of “GL”
For GPS data replace the lc column with a string of “G”
ideally for geolocation data youwould have standard errors associated with each location, which can be passed as columns to XXX.
If you do not have these data then a good starting place is somewhere between 0.25 and 1 deg SE’s for Latitude and .5 that value for Longitude.
For GPS data you can try a small error variance i.e. 10s of metres
Rework the list of things to do so that they match the introductory slides
Prep data Run fit_ssm
etc
What about other sources of data? Geolocation GPS PTT